1.    Coarse  grained  Designs

1.1.  Subclassifications 

A.  Procedurally  Oriented,  Omega  Network  or  Cube  Design 

A.1.   Packet  Switching
A. 2.    Circuit  Switching 

B.  Dataflow 

C.  Tree  Structured  Machines

D.  Nearest  Neighbor  Machines 

E.  Crossbar  Designs 

F.  Ring  Structured  Machines 

G.  Bus  Structured  Machines 

H.  Miscellaneous  and  Eclectic  Designs

1.2.  Detailed  Descriptions 

A.    Procedurally  Oriented,  Omega  Network  or  Cube  Design 
A.1.   Packet  Switching

(1)  Gottlieb/Grishman/Kruskal...  (NYU)  Ultracomputer  1983 
Combining  Network,  Fetch-and-add  coordination 

(2)  Kuck/Gajski/Lawrie...  (U.Ill.,  Urbana)  CEDAR  1982

Local   clusters   of   8    processors   with   crossbar   interconnect   treated    as 
smallest  assignable  execution  unit. 

(3)  Lundstrom/Barnes   (Burroughs  Corp)  FMP

Packet communication handled by short circuit-switch phases;
supplemented global-or net; no combining of requests. In other respects
resembles NYU Ultracomputer.

(4)  Rettberg/Kraley   (BBN  Corp.)    Butterfly    1979 

MC68000-based  MIMD  system  with  memory  and  processor  association. 
Network supports block transfers and interrupt requests. Seems to be
logically circuit-switched rather than packet-switched and to involve no
buffering or synchronization on switches. Memory not interleaved, so
Ultracomputer-type coordination is impossible. Fetch-and-add provided at
software level.

(5)  Smith   (Denelcor)   HEP  1980 

Powerful  individual  processors  timesliced  to  match  memory  latency. 

(6)  Seitz/Locanthi/Fox...  (Caltech)   Homogeneous  Machine  1972
Hypercube interconnect explicitly visible to programmer; message
transmission along a single edge is the basic interprocessor communication
step.

(7)  Sullivan/Bashkow   (Sullivan  Associates)   CHOPP  1978

Hypercube  interconnect  used  in  earlier  version,  may  go  to  omega  net 
(proprietary). 

(8)  Briggs/Fu/Hwang...  (Rice  U.  and  Purdue)  PUMPS  1982 

Vaguely  fleshed-out  proposal  for  parallel  processor  with  supplementary 
special-purpose  chips,  including  some  for  image  processing. 
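
As a rough illustration of the fetch-and-add coordination and request combining listed for the Ultracomputer in entry (1), the sketch below models the primitive sequentially and shows how a switch might merge two requests for the same cell into one memory access. The names Memory, fetch_and_add and combine are hypothetical, chosen only for this illustration; it is a model of the idea under those assumptions, not the NYU hardware design.

```python
class Memory:
    """Shared cells; each fetch_and_add call is modeled as indivisible."""

    def __init__(self):
        self.cells = {}

    def fetch_and_add(self, addr, inc):
        # Return the old value of the cell and add inc to it.
        old = self.cells.get(addr, 0)
        self.cells[addr] = old + inc
        return old


def combine(memory, addr, inc_a, inc_b):
    """Model a switch merging two requests for one address into a single
    memory access, then splitting the reply for the two requesters."""
    old = memory.fetch_and_add(addr, inc_a + inc_b)  # one access reaches memory
    return old, old + inc_a  # replies as if request A had been served before B


mem = Memory()
mem.cells[0] = 10
reply_a, reply_b = combine(mem, 0, 3, 5)
print(reply_a, reply_b, mem.cells[0])
```

The two replies and the final cell value are the same as if the two requests had been served one after the other, which is what lets combining stay invisible to the processors.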

A. 2.    Circuit  Switching 

(1)    Browne/Lipovski/Tripathi...  (U.Texas,  Austin)  TRAC  1980

Circuit switching design with some packet switching capabilities. The
circuit switching is used to achieve "reconfigurability", i.e., assignment
of memory and processing power to tasks, or even of fine grained
computational resources such as byte-wide adders, which can be linked
together into longer adders.
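
The idea of linking byte-wide adders into longer ones can be sketched as a carry chain. The function names add_byte and add_chained are hypothetical, and the sketch assumes the simplest possible linkage (ripple carry, least significant byte first); it says nothing about how TRAC actually configures its switches.

```python
def add_byte(a, b, carry_in):
    """One byte-wide adder: returns (8-bit sum, carry out)."""
    total = a + b + carry_in
    return total & 0xFF, total >> 8


def add_chained(x, y, n_bytes):
    """Link n_bytes byte-wide adders through their carries,
    least significant byte first, to add two wider operands."""
    result, carry = 0, 0
    for i in range(n_bytes):
        a = (x >> (8 * i)) & 0xFF
        b = (y >> (8 * i)) & 0xFF
        s, carry = add_byte(a, b, carry)
        result |= s << (8 * i)
    return result, carry


total, carry_out = add_chained(0x12FF, 0x0001, 2)
print(hex(total), carry_out)
```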

B.    Dataflow 

(1)  Dennis/Misunas   (MIT)  Static  Dataflow  1975-80 

(2)  Arvind   (MIT)   Tagged  Token  Dataflow  1980 

Bus-structured  system  proposed  in  initial  Irvine  variant.  As  compared  to
basic  Dennis  ideas,  additional  "tags"  allow  dynamic  loop  unrolling  while 
maintaining  proper  dependency  relationships  between  operations. 
Attempts  to  keep  computation  within  a  single  processing  element. 
Hardware  support  for  "I  structures"  (aggregates). 

(3)  Davis/Dongrawski   (U.Utah)  DDM1  1975

Tree-structured machine; dependent on locality of virtual dataflow
processing; more layout to attain efficiency.

(4)  Hogenauer/Newbold/Inn   (TRW  Corp)  DD  DDSP  1982
32-processor  configuration  organized  into  "subgroups"  of  2,  "clusters" 
of  8  on  several  levels  of  buses.    Executes  binary  operations  in  dataflow 
fashion  using  common  associative  "matching  store"  to  dispatch
completed  sets  of  operands.    Supports  up  to  2000  virtual  dataflow  nodes. 

(5)  Sowa/Murata   (U.  Illinois,  Chicago)    Dataflow 

"Associative multiported memory" used to achieve ready-packet
dispatching. (Note: This suggestion is not usable for large dataflow
systems.)

(6)  Sauber/Cornish   (Texas  Instruments)  TI  Dataflow  Machine  1980 
Never  seems  to  have  got  past  testbed  status. 

(7)  Gurd/Watson/Glauert   (Univ.  Manchester)  Manchester  Dataflow  1981
Ring-structured  dataflow  with  20  processors. 

(8)  Kishi/Yasuhara/Kawamura   (OKI)  DDMP

Supports  loop  unrolling  similar  to  Tagged-Token  architecture. 

(9)  Takahashi/Amamiya   (NTT)   Dataflow  Processing  Array 
Dataflow  processors  arranged  in  2D  grid. 
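
Two mechanisms named in the entries above, the "matching store" of entry (4) and the "tags" of entry (2), can be illustrated together in a small sketch. It assumes binary operations only and uses a dictionary keyed by (node, tag) in place of associative hardware; the class MatchingStore and its method arrive are hypothetical names, not taken from any of the listed machines.

```python
import operator


class MatchingStore:
    """Holds operand tokens until both inputs of a binary node are present."""

    def __init__(self, ops):
        self.ops = ops      # node id -> binary function
        self.waiting = {}   # (node, tag) -> {port: value}

    def arrive(self, node, tag, port, value):
        """Store a token; return the node's result once its partner token
        with the same tag has also arrived, otherwise None."""
        key = (node, tag)
        slot = self.waiting.setdefault(key, {})
        slot[port] = value
        if len(slot) < 2:
            return None
        del self.waiting[key]
        return self.ops[node](slot[0], slot[1])


store = MatchingStore({"sub": operator.sub})
first = store.arrive("sub", 0, 0, 10)   # iteration 0, left operand
second = store.arrive("sub", 1, 1, 4)   # iteration 1, right operand
third = store.arrive("sub", 1, 0, 7)    # completes iteration 1
fourth = store.arrive("sub", 0, 1, 1)   # completes iteration 0
print(first, second, third, fourth)
```

Because the tag is part of the key, tokens from different loop iterations can wait at the same node without being paired with each other, which is the property that allows iterations to be unrolled dynamically.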

C.  Tree  Structured  Machines 

(1)  Stolfo   (Columbia  U.)  DADO 

MIMD/SIMD tree machine, in which any binary subtree is able to execute
in SIMD mode.

(2)  Goodman/Despain   (U.C.  Berkeley)    XTREE  1978 

Tree  machine  with  additional  perfect  shuffle  connections  between  the 
leaves. 

(3)  Sequin/Goodman   (U.C.  Berkeley)    Hypertree

Tree  machine  with  additional  cube  connections  between  nodes  of  each 
level. 

(4)  Keller/Lindstrom/Patil   (U.Utah)    AMPS  1979 

Tree  with  processing  elements  at  leaves  only  and  intermediate  nodes 
specialized  for  communication. 

(5)  Song   (CMU)   Tree  Machine  1980 

Tree machine with memory storage at nodes; retrieval requests are
broadcast in one tree of processors and results are combined in an
inverted tree sharing the same nodes.

(6)  Shin/Lee/Sasidar   (RPI)   HM^P   1982 

Tree-like  machine  based  on  "clusters"  which  share  a  common  bus,  and 
shared  on-bus  memory;   these  "clusters"  are  then  organized  into  a  tree. 

(7)  Mago   (U.  North  Carolina) 

Reduction-oriented  tree  machine.  Binary  tree  machine  with  additional 
interconnects  between  siblings. 
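
The broadcast-and-combine pattern described for entry (5) can be sketched as a walk over a binary tree whose nodes each hold some records. The sketch assumes a complete binary tree stored in a list (children of node i at 2i+1 and 2i+2) and combines results by concatenation; tree_query is a hypothetical name and the recursion only stands in for what the machine would do in parallel.

```python
def tree_query(nodes, key, index=0):
    """Broadcast `key` from node `index` down its subtree and combine
    the matching values on the way back up."""
    if index >= len(nodes):
        return []
    local = [value for k, value in nodes[index] if k == key]
    left = tree_query(nodes, key, 2 * index + 1)
    right = tree_query(nodes, key, 2 * index + 2)
    return local + left + right


# Each node stores a few (key, value) records.
nodes = [
    [("a", 1)],
    [("b", 2), ("a", 3)],
    [("c", 4)],
    [("a", 5)],
]
matches = tree_query(nodes, "a")
print(matches)
```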

D.  Nearest  Neighbor  Machines 

(1)    Slotnick   (U.  Illinois)   ILLIAC  IV  1970 

The  classic  nearest  neighbor  rectangular  array. 

(2)  Halstead/Ward   (MIT)    MUNET  1980

Nearest  neighbor  array,  not  necessarily  rectangular,  with  memories 
intermediate  between  processors. 

(3)  Jordan/Storaasli/Pratt   (NASA-Langley)   Finite  Element  Machine
Nearest  neighbors  including  diagonal  interconnect  with  supplementary 
busses;     circuit-switching    reconfigurability    concept    in    which    finite- 
element  mesh  is  mapped  directly  to  configured  processor  array. 

(4)  Kung/Arun/Gal-Ezer...  (U.S.C.)    Wavefront  Array  Processor  1982
Nearest  neighbor  SIMD  array  with  systolic  usage  concept. 

(5)  Hoshino/Kawai/Shirakawa   (Tsukuba  U.)   PACS  1983 
Ordinary,  32  processor  nearest  neighbor  array. 

(6)  Brooks/Fox/Gupta...  (Caltech)   Cosmic  Cube  —  NNCP  1981 
4x4  nearest  neighbor  MIMD  array,  based  on  8086/87.

E.  Crossbar  Designs 

(1)  de  Witt   (U.  Wisconsin)   Database  Machine 

Low  degree  of  parallelism,  intended  for  database  searches. 

(2)  (LLL)   S-1  1978 

High-speed  individual  processors,  low  degree  of  parallelism. 

(3)  Buehrer/Brundiers/Benz...  (E.T.H.,  Zurich)    EMPRESS  1982
17-processor    crossbar    design,    with    one    processor    specializing     as 
"supervisor." 

(4)  Villemin   (Comp.  Sci.  Dept.,  CNAM,  Paris)   SERFRE  1982 

Vaguely  described  proposal  involving  hierarchy  of  crossbars,  to  realize
multi-descendant  tree  with  crossbar  communication  between  groups  of 
siblings. 

(5)  Trujillo   (LASL)    Multimicroprocessor  1981 

20  microprocessors  communicating  on  a  crossbar  switch. 

F.  Ring  Structured  Machines 

(1)    Minker/Rieger/Bare...    (U.  Maryland)   ZMOB  1980 

256 microprocessors in ring configuration; messages received by
interrupt; send to specific processor, to all, etc. supported. Parallel
PROLOG application being developed.

G.  Bus  Structured  Machines 

(1)  Taylor   (ELXSI  Corp.)    ELXSI  1981 

Up  to  16  4-MIP  ECL  processors  organized  around  very  high-speed  bus. 

(2)  Cordonnier/Mossu   (Univ.  Lille)     MAP  1981 
16  microprocessor  shared  bus  configuration. 

(3)  Davidson   (U.  Ill)   AMP-1  1980 

Small  shared  bus  multiprocessor  system. 

(4)  Manner   (Univ.  Heidelberg)    Polyp  1982
Bus  structured  multimicroprocessor  system. 

(5)  Dimopoulos   (Concordia  Univ.,  Canada)    Homogeneous  Multiprocessor  1983

A  multiple-bus  multimicroprocessor  configuration  (a  small,  apparently 
paper,  proposal). 

(6)  Guzman    (U.  of  Mexico)    Parallel  Heterarchical  Machine   1980

Bus  structured  multi-LISP  machine  configuration,  intended  for  execution 
of  parallel  LISP. 

Many designs of this general class have begun to appear lately, e.g., FLEX32, a
2D lattice of VME buses.

H.  Miscellaneous  and  Eclectic  Designs

(1)  Siegel/Kemmerer/Washburn   (Purdue)   PASM  1980

A "partitionable" SIMD shuffle-based machine (other high performance
networks are also being considered), in which the communication net can
be decomposed into portions of size 2**m operating under control of
many standard processors. Note that the remaining shuffle connections
provide for communication between supervisors. Global "or" among
controllers also provided.

(2)  Arden/Ginosar     (Princeton)  MP/C 

Processors  and  memories  arranged  linearly,  with  switches  that  allow  the 
line  to  be  broken  arbitrarily  into  subsegments.  Only  one  processor 
active  in  each  subset  at  a  given  time. 

(3)  Bronson/Siegel   (Purdue  U.)     Parallel  Speech  Processor  1982
Proposal for specialized collection of parallel machines arranged in series;
the "acoustic processor" substage is to consist of 512 cube-connected
MC68000s.

(4)  Postel     (Intermetrics  Corp.)   Hybrid  Dataflow  System  1982 

Nearest  neighbor  interconnect  with  superimposed  tree,  and  all  nodes  at  a 
given  level  circularly  interconnected. 

(5)  Maples/Weaver/Logan     (LBL,  Berkeley)    MIDAS  1983
Circuit-switched attachment of memory modules to processors within
several separate "clusters"; a common memory is also available to all the
processors within a cluster. About 10 processors form a cluster; the
clusters are then organized into a tree.

(6)  Wah/Ma     (Purdue)    MANIP  1980 

Vaguely  fleshed-out  proposal  intended  for  parallel  branch-and-bound 
applications.  Crossbar  interconnected  within  clusters,  linear  cyclic 
connection  between  clusters. 

(7)  Treleaven/Mole     (Newcastle)     Multiprocessor  Reduction  Machine 

A    linear    array    of    processors    sharing    memory    used    for    processing 
reduction  languages.

2.    Finegrained  Designs 

2.1.    Subclassifications 

A.  Bitwise  cube  and  shuffle 

B.  Nearest  neighbor  bitwise  processing 

C.  Tree  Machines 

D.  Circuit  Switching  Reconfigurable 

E.  Systolic  special  purpose  chips 

2.2.   Detailed  Descriptions

A.  Bitwise  Cube  and  Shuffle 

(1)  Hillis   (Thinking  Machines  Corp.)   Connection  Machine  1983
Single-bit processors, 16/chip; hypercube interconnect with 50-bit
message packets passed in overlapped manner using all cube edges
simultaneously.

(2)  Sussman   (MIT)    Connection  Machine  1981 
Earlier  version  of  Hillis  machine.

(3)  Wagner   (Duke  U.)    Boolean  Vector  Machine  1982 

1-bit processors each storing a vector of m bits, with bitwise operations
and single-bit shuffle-neighbor transport operations. (Note: less
specialized hardware than the Connection Machine may penalize the most
useful macro operations, especially message passing, by requiring
generalized treatment; will be better on, say, the SIMD operations.) Very
small memory (128 bits/PE) proposed.
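
The two interconnects named in this subsection can be stated compactly in terms of processor addresses. In the sketch below, which assumes 2**dim processors numbered from 0, a cube edge joins addresses that differ in one bit and the shuffle link rotates the address left by one bit. The function names are hypothetical, and the route shown (fixing differing bits from lowest to highest) is only one simple choice of path, not a description of any listed machine's router.

```python
def cube_neighbors(node, dim):
    """Addresses reachable over one hypercube edge: flip a single bit."""
    return [node ^ (1 << d) for d in range(dim)]


def cube_route(src, dst, dim):
    """A path from src to dst that corrects one differing address bit
    per hop, lowest bit first."""
    path = [src]
    cur = src
    for d in range(dim):
        if (cur ^ dst) & (1 << d):
            cur ^= 1 << d
            path.append(cur)
    return path


def shuffle(node, dim):
    """Perfect shuffle link: rotate the dim-bit address left by one bit."""
    mask = (1 << dim) - 1
    return ((node << 1) | (node >> (dim - 1))) & mask


print(cube_neighbors(5, 3))
print(cube_route(0b000, 0b101, 3))
print(shuffle(0b110, 3))
```

The number of hops on such a route equals the number of bits in which the two addresses differ, so no route in a cube of dimension dim needs more than dim single-edge steps.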

B.  Nearest  Neighbor  Bitwise  Processing 

(1)  Batcher   (Goodyear  Aerospace)    STARAN  1972 

32  X  4096  bit-array  of  1-bit  processors,  SIMD  machine;  array  accessible 
either  by  row  or  by  column. 

(2)  Surprise   (Goodyear  Aerospace)   ASPRO  1981 

Design  intermediate  between  STARAN  and  MPP;  2048  single-bit 
processors,  memory  array  2048x4096,  accessible  either  by  rows  or 
columns. 

(3)  Batcher   (NASA-Goddard)   MPP  1980 

128x128  bitwise  processor  SIMD  array  with  global-or  capability. 

(4)  (ICL  Corp.)    DAP  1979 

64x64  bitwise  processor  SIMD  array  with  global-or  capability.
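
To suggest why access "by column" and a global-or are useful in machines of this kind, the sketch below searches a set of stored words for a pattern one bit position at a time, then reduces the per-word flags with an OR. This use is an assumption made for illustration; the entries above state only that the arrays can be accessed by row or column and that a global-or is available. The names bit_slice and associative_match are hypothetical.

```python
def bit_slice(words, bit, width):
    """Column access: bit `bit` (0 = most significant) of every word."""
    return [(w >> (width - 1 - bit)) & 1 for w in words]


def associative_match(words, pattern, width):
    """Bit-serial search: one flag per word, updated one bit slice at a
    time, as a row of 1-bit processors might do in lockstep."""
    flags = [1] * len(words)
    for bit in range(width):
        want = (pattern >> (width - 1 - bit)) & 1
        column = bit_slice(words, bit, width)
        flags = [f & (1 if c == want else 0) for f, c in zip(flags, column)]
    return flags


words = [5, 3, 5, 0]                 # row access: one stored word per row
flags = associative_match(words, 5, 4)
global_or = any(flags)               # did any processor report a match?
print(flags, global_or)
```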

C.  Tree  Machines

(1)  Mead/Browning     (Caltech)   Tree  Machine 

4  bit  wide,  512  nibble  memory  processor  VLSI  tree  proposal. 

(2)  Shaw    (Columbia  U.)   NONVON 

Tree  structured  from  finegrained  processors  but  with  shuffle  connections 
and  more  substantial  processors  near  the  root  and  with  disk  controllers  at 
intermediate  levels;  SIMD,  except  that  each  disk  controller  can  operate 
autonomously,  for  SIMD/MIMD  effect. 

D.  Circuit  Switching  Reconfigurable 

(1)    Snyder     (Purdue)   Blue  Chip  1981 

Nearest  neighbor  with  diagonal  interconnect  between  "switching"  and 
"processing"  nodes;  wafer-scale  integration;  can  put  arbitrarily  many 
switching  rows  and  columns  between  processors. 

E.  Systolic  special  purpose  chips 

(1)    Kung...    (CMU)   Systolic  processors  1979 

An  entire  family  of  special  purpose  systolic  chips  for  various  numerical, 
signal  processing,  pattern  matching,  associative,  and  buffering  functions. 

3.   Micro-Overlapped  Serial  Processor  Designs 

(1)  Fisher      (Yale)   ELI 

Multiple  fast  functional  units,  dispatchable  up  to  16  instructions  at  a 
time.  Reliance  is  on  500  bit  long  horizontal  microinstructions  and 
automatic  compilation  of  effective  horizontal  microcode. 

(2)  Kuck/Stokes     (Burroughs  Corp.)    BSP  1975 

A  commercial  multi-functional  unit,  horizontally  microcoded  design. 

(3)  (CDC  Corp.)    AFP    1980 

A  commercial  multi-functional  unit,  horizontally  microcoded  design. 

(4)  Requa/McGraw     (LLL)    Piecewise  Dataflow    1983

48  bit-wide  horizontal  microcode  with  vector,  several  scalar,  and 
fetch/store  unit.  Hardware  FIFOs  used  to  control  dispatching  of  ready 
instructions. 